newly produced DNA radioactively, but also mixed dideoxyadenosine triphosphate (ddATP) with the normal deoxyadenosine triphosphate (dATP). As a result, each time the polymerase incorporates an adenine, there is a chance of about 1% that it uses the dideoxy form. This form lacks the 3′-hydroxyl group needed to attach the next nucleotide, so the chain breaks off at that adenine. After the radiolabelled fragments are sorted by size (by gel electrophoresis) and exposed to film, the position of every adenine in the sequence becomes visible. If I use the other dideoxynucleotides, I can read the other three bases in the same way. I can also replace the radioactive label with nucleotides carrying fluorescent dyes of different colours and use a laser to identify each nucleotide online, as the fragments pass the detector. All of this made it possible to determine DNA sequences ever faster, and the resulting flood of sequences is ultimately stored in large computer databases. As the sequencing reaction and the separation of the fragments were miniaturised further and further, sequencing speed kept increasing, so that it is now possible to read many millions of nucleotides per lane and to process many lanes simultaneously. To determine a genome sequence, the organism's DNA is first broken into small pieces (the "shotgun" method), and all these pieces are then sequenced in parallel at very high speed. However, this makes another task increasingly difficult: putting the many sequence snippets together correctly so that the genome sequence can be reconstructed from them ("mapping" and "assembly" of the genome sequence). Regions in which a sequence is repeated again and again (repeat regions) are particularly hard to reconstruct correctly, both in their length and in their number of repeats, because snippets shorter than the repeat region cannot show how many copies it contains.
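The chain-termination principle described above can be simulated in a few lines. The following Python sketch is a simplification under stated assumptions: the function names `sanger_fragments` and `read_positions` are hypothetical, the template string is treated directly as the newly synthesised strand (in reality the new strand is complementary to the template), and every adenine incorporation terminates the chain with a fixed probability of 1%, as in the text. Sorting the lengths of the terminated fragments stands in for separating them by size on the gel.

```python
import random


def sanger_fragments(template: str, base: str = "A", p_stop: float = 0.01,
                     n_molecules: int = 20000, seed: int = 1) -> list[int]:
    """Simulate chain termination with one dideoxynucleotide (hypothetical helper).

    Simplification: `template` is read as the newly synthesised strand itself.
    Each time `base` is incorporated, the dideoxy form is used with probability
    `p_stop`, which ends the chain. Returns the lengths of terminated fragments.
    """
    rng = random.Random(seed)  # fixed seed so the simulation is reproducible
    lengths = []
    for _ in range(n_molecules):
        for pos, nt in enumerate(template, start=1):
            if nt == base and rng.random() < p_stop:
                lengths.append(pos)  # chain broke off after `pos` nucleotides
                break
        # molecules that never incorporate a dideoxynucleotide are full length
        # and are ignored here
    return lengths


def read_positions(lengths: list[int]) -> list[int]:
    """Sorting fragments by size, as on the gel, reveals the base positions."""
    return sorted(set(lengths))


template = "GATTACAGGACTTAGCA"
lengths = sanger_fragments(template)
print(read_positions(lengths))
```

With 20,000 simulated molecules, each adenine position is hit roughly 200 times, so sorting the distinct fragment lengths recovers all adenine positions (2, 5, 7, 10, 14 and 17 in this template). Calling `sanger_fragments(template, base="G")` and so on would correspond to the reactions with the other dideoxynucleotides.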
3.1
The other parts of the genome sequence, whose function cannot easily be inferred from high similarity to known sequences, have to be analysed in more detail. Here, machine learning and artificial intelligence methods (Chap. 14) help to interpret the sequence. For example,
2008
3 Genomes: Molecular Maps of Living Organisms